A Classification Tree Approach to Automatic Segmentation of Japanese Compound Sentences
نویسندگان
چکیده
It is well known that direct parsing of a long Japanese compound sentence is extremely difficult. Various pre-processing methods have been proposed to segment such a sentence into shorter, simpler ones prior to parsing. The problem with the conventional methods is that some kind of segmentation patterns or heuristic preference scores must be given manually, hence no guarantee for optimality. This paper proposes a new method of sentence segmentation based on a classification tree technique. In this method, optimal segmentation patterns and the optimal order of their application are automatically acquired from training data, linguistic phenomena together with their occurence frequencies being taken into account. Generation of a classification tree is conducted on an EDR corpus, and evaluation results are reported. It is shown that pruning reduces the tree size by a factor of about 1/4 without affecting the performance.
منابع مشابه
Plant Classification in Images of Natural Scenes Using Segmentations Fusion
This paper presents a novel approach to automatic classifying and identifying of tree leaves using image segmentation fusion. With the development of mobile devices and remote access, automatic plant identification in images taken in natural scenes has received much attention. Image segmentation plays a key role in most plant identification methods, especially in complex background images. Wher...
متن کاملObject-Based Classification of UltraCamD Imagery for Identification of Tree Species in the Mixed Planted Forest
This study is a contribution to assess the high resolution digital aerial imagery for semi-automatic analysis of tree species identification. To maximize the benefit of such data, the object-based classification was conducted in a mixed forest plantation. Two subsets of an UltraCam D image were geometrically corrected using aero-triangulation method. Some appropriate transformations were perfor...
متن کاملAutomatic road crack detection and classification using image processing techniques, machine learning and integrated models in urban areas: A novel image binarization technique
The quality of the road pavement has always been one of the major concerns for governments around the world. Cracks in the asphalt are one of the most common road tensions that generally threaten the safety of roads and highways. In recent years, automated inspection methods such as image and video processing have been considered due to the high cost and error of manual metho...
متن کاملA Lexicalized Tree Adjoining Grammar for Thai
This paper describes an alternative formalism for Thai syntax parsing based on a lexicalized tree adjoining grammar (LTAG). We first briefly present some formal background concerning LTAG, which is necessary for an understanding of LTAG and its application to Thai. Specifically, we address several issues regarding difficulties in parsing Thai sentences and how to resolve these issues using LTAG...
متن کاملAutomatic Bunsetsu Segmentation of Japanese Sentences Using a Classification Tree
Bunsetsu, which is comprised of a content word followed by, possibly 0, function words, is a convenient unit for dependency structure analysis of Japanese. There are, however, no spaces indicating bunsetsu boundaries in the orthographic writing of Japanese. Thus a sentence must be segmented into bunsetsu's by some means prior to dependency structure analysis. Conventionally, such segmentation h...
متن کامل